Detection method of domains generated by dictionary-based domain generation algorithm
ZHANG Yongbin, CHANG Wenxin, SUN Lianshan, ZHANG Hang
Journal of Computer Applications    2021, 41 (9): 2609-2614.   DOI: 10.11772/j.issn.1001-9081.2020111837
Domain names generated by dictionary-based Domain Generation Algorithms (DGAs) are composed so similarly to benign domain names that existing techniques have difficulty detecting them effectively. To solve this problem, a detection model named CL, which combines a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network, was proposed. The model consists of three parts: a character embedding layer, a feature extraction layer, and a fully connected layer. Firstly, the characters of the input domain name were encoded by the character embedding layer. Then, the features of the domain name were extracted by the feature extraction layer, which connects a CNN and an LSTM in series: the CNN extracted n-gram features of the domain name, and the extracted results were sent to the LSTM to learn the context features between n-grams. Meanwhile, different combinations of CNNs and LSTMs were used to learn the features of n-grams of different lengths. Finally, dictionary-based DGA domain names were classified and predicted by the fully connected layer according to the extracted features. Experimental results show that the proposed model achieves its best performance when the CNN branches use convolution kernels of sizes 3 and 4. In experiments on four dictionary-based DGA families, the accuracy of the CL model is 2.20% higher than that of the CNN model, and the CL model remains more stable as the number of sample families increases.
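The abstract describes the architecture only at a high level; a minimal PyTorch sketch of one plausible reading (a shared character embedding feeding parallel serial CNN-to-LSTM branches with kernel sizes 3 and 4, followed by a fully connected classifier; all layer widths are assumptions, not the paper's values) could look like the following.

```python
import torch
import torch.nn as nn

class CLModel(nn.Module):
    """Hypothetical sketch of the CL model: one CNN -> LSTM branch per n-gram length."""
    def __init__(self, vocab_size=40, embed_dim=32, kernel_sizes=(3, 4),
                 num_filters=64, lstm_hidden=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One serial CNN -> LSTM branch per kernel size (n-gram length)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k, padding=k // 2) for k in kernel_sizes])
        self.lstms = nn.ModuleList(
            [nn.LSTM(num_filters, lstm_hidden, batch_first=True) for _ in kernel_sizes])
        self.fc = nn.Linear(lstm_hidden * len(kernel_sizes), num_classes)

    def forward(self, x):                       # x: (batch, seq_len) character indices
        e = self.embed(x).transpose(1, 2)       # (batch, embed_dim, seq_len)
        branches = []
        for conv, lstm in zip(self.convs, self.lstms):
            n_grams = torch.relu(conv(e)).transpose(1, 2)  # (batch, L, num_filters)
            _, (h_n, _) = lstm(n_grams)         # last hidden state summarizes context
            branches.append(h_n[-1])
        return self.fc(torch.cat(branches, dim=1))

# Example: classify a batch of 8 domains, each encoded/padded to 30 characters.
model = CLModel()
logits = model(torch.randint(1, 40, (8, 30)))   # shape (8, 2)
```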
Erasure code with low recovery-overhead in distributed storage systems
ZHANG Hang, LIU Shanzheng, TANG Dan, CAI Hongliang
Journal of Computer Applications    2020, 40 (10): 2942-2950.   DOI: 10.11772/j.issn.1001-9081.2020010127
Erasure code technology is a typical data fault-tolerance method in distributed storage systems. Compared with multi-copy technology, it can provide high data reliability with low storage overhead. However, its high repair cost limits its practical application. To address the high repair cost, complex encoding, and poor flexibility of existing erasure codes, a simply encoded erasure code with low repair cost, named Rotation Group Repairable Code (RGRC), was proposed. In RGRC, multiple stripes were first combined into a stripe set. After that, the association relationship between the stripes was used to hierarchically rotate and encode the data blocks in the stripe set to obtain the corresponding redundant blocks. RGRC greatly reduced the amount of data that needs to be read and transmitted during single-node repair, thus saving a large amount of network bandwidth. Meanwhile, RGRC retained high fault tolerance while solving the problem of the high repair cost of a single node, and, to meet the different needs of distributed storage systems, it was able to flexibly trade off the storage overhead and repair cost of the system. Comparison experiments were conducted on a distributed storage system; the experimental analysis shows that, compared with RS (Reed-Solomon) codes, LRC (Locally Repairable Codes), basic-Pyramid, DLRC (Dynamic Local Reconstruction Codes), pLRC (proactive Locally Repairable Codes), GRC (Group Repairable Codes), and UFP-LRC (Unequal Failure Protection based Local Reconstruction Codes), RGRC reduces the cost of single-node repair by 14%-61% at the expense of a small amount of extra storage overhead, and shortens the repair time by 14%-58%.
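The abstract does not specify RGRC's rotation and layering rules, so the sketch below does not implement RGRC itself; it only illustrates, in plain Python, the group-repair principle such codes exploit: with one XOR parity per group of data blocks, a single lost block can be rebuilt by reading only its group rather than the whole stripe, which is where the bandwidth saving comes from. The group size, function names, and plain XOR parity are all illustrative assumptions.

```python
def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

def encode_groups(data_blocks, group_size):
    """Split data blocks into groups, adding one XOR parity block per group."""
    groups = [data_blocks[i:i + group_size]
              for i in range(0, len(data_blocks), group_size)]
    return [(group, xor_blocks(group)) for group in groups]

def repair_block(group, parity, lost_index):
    """Rebuild one lost block by reading only its group (group_size blocks),
    instead of the k blocks a classic RS repair would have to fetch."""
    survivors = [blk for i, blk in enumerate(group) if i != lost_index]
    return xor_blocks(survivors + [parity])

# Example: 6 data blocks in groups of 3; repair block 1 of the first group.
data = [bytes([v]) * 4 for v in range(6)]
(group0, parity0), _ = encode_groups(data, group_size=3)
assert repair_block(group0, parity0, lost_index=1) == data[1]
```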
Adaptive variable step-size blind source separation algorithm based on nonlinear principal component analysis
GU Fanglin, ZHANG Hang, LI Lunhui
Journal of Computer Applications    2013, 33 (05): 1233-1236.   DOI: 10.3724/SP.J.1087.2013.01233
The design of the step size is crucial to the convergence rate of the Nonlinear Principal Component Analysis (NPCA) algorithm. However, the commonly used fixed step-size algorithm can hardly satisfy the requirements on convergence speed and estimation precision simultaneously. To address this issue, a gradient-based adaptive step-size NPCA algorithm and an optimal step-size NPCA algorithm were proposed to speed up convergence and improve tracking ability. In particular, the optimal step-size NPCA algorithm linearly approximated the contrast function and computed the current optimal step size; its step size was adjusted adaptively in accordance with the value of the contrast function and required no manually tuned parameters. The simulation results show that, at the same estimation precision, the proposed adaptive step-size NPCA algorithms have a faster convergence rate or better tracking ability than the fixed step-size NPCA algorithm, and the convergence performance of the optimal step-size NPCA algorithm is superior to that of the gradient-based adaptive step-size NPCA algorithm.
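Neither step-size rule is given in closed form in the abstract; the sketch below only illustrates the general idea of tying the step size to the contrast value, using one common least-squares NPCA contrast and a numerical gradient purely for readability. The scaling rule mu_t = mu0 * J(W_t) is a hypothetical stand-in for the paper's gradient-based and optimal schedules, not the authors' update.

```python
import numpy as np

def npca_contrast(W, Y, g=np.tanh):
    """One common least-squares NPCA contrast: J(W) = E||y - W g(W^T y)||^2."""
    E = Y - W @ g(W.T @ Y)
    return np.mean(np.sum(E ** 2, axis=0))

def adaptive_step_npca(Y, n_sources, iters=200, mu0=0.05, eps=1e-6, seed=0):
    """Gradient descent on the NPCA contrast with a contrast-scaled step size:
    mu_t = mu0 * J(W_t), so steps shrink automatically as separation improves."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((Y.shape[0], n_sources))
    for _ in range(iters):
        J = npca_contrast(W, Y)
        G = np.zeros_like(W)            # numerical gradient, illustration only
        for i in range(W.shape[0]):
            for j in range(W.shape[1]):
                Wp = W.copy()
                Wp[i, j] += eps
                G[i, j] = (npca_contrast(Wp, Y) - J) / eps
        W -= mu0 * J * G                # adaptive (contrast-scaled) step
    return W

# Example call on a 2 x 1000 observation matrix (stand-in data).
Y = np.random.default_rng(1).standard_normal((2, 1000))
W = adaptive_step_npca(Y, n_sources=2)
```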